Demographics of sources of HIV-1 transmission in Zambia: a molecular epidemiology analysis in the HPTN 071 PopART study

Summary
Background In the last decade, universally available antiretroviral therapy (ART) has led to greatly improved health and survival of people living with HIV in sub-Saharan Africa, but new infections continue to appear. The design of effective prevention strategies requires the demographic characterisation of individuals acting as sources of infection, which is the aim of this study.
Methods Between 2014 and 2018, the HPTN 071 PopART study was conducted to quantify the public health benefits of ART. Viral samples from 7124 study participants in Zambia were deep-sequenced as part of HPTN 071-02 PopART Phylogenetics, an ancillary study. We used these sequences to identify likely transmission pairs. After demographic weighting of the recipients in these pairs to match the overall HIV-positive population, we analysed the demographic characteristics of the sources to better understand transmission in the general population.
Findings We identified a total of 300 likely transmission pairs. 178 (59·4%) were male to female, with 130 (95% CI 110–150; 43·3%) from males aged 25–40 years. Overall, men transmitted 2·09-fold (2·06–2·29) more infections per capita than women, a ratio peaking at 5·87 (2·78–15·8) in the 35–39 years source age group. 40 (26–57; 13·2%) transmissions linked individuals from different communities in the trial. Of 288 sources with recorded information on drug resistance mutations, 52 (38–69; 18·1%) carried viruses resistant to first-line ART.
Interpretation HIV-1 transmission in the HPTN 071 study communities comes from a wide range of age and sex groups, and there is no outsized contribution to new infections from importation or drug resistance mutations. Men aged 25–39 years, underserved by current treatment and prevention services, should be prioritised for HIV testing and ART.
Funding National Institute of Allergy and Infectious Diseases, US President's Emergency Plan for AIDS Relief, International Initiative for Impact Evaluation, Bill & Melinda Gates Foundation, National Institute on Drug Abuse, and National Institute of Mental Health.

Sensitivity analysis results
Table S4: Command line options for HIV-TRACE, IQ-TREE, and phyloscanner
Table S5: REGA subtyping results for individuals in transmission pairs
References

Further sequencing details
Plasma samples used for this study were derived from residual EDTA-treated blood collected for CD4 quantification, which was centrifuged at the local health care facility, transported frozen to a central laboratory in Lusaka, and then shipped to Oxford for extraction and onward processing. Full details of the sequencing methods and bioinformatics have previously been published, along with a validation against a clinically accredited HIV drug resistance assay and HIV viral load assay 1, which demonstrated robust whole genome characterisations for samples with viral loads > 5000 RNA copies per ml. Water controls and negative plasma controls were included in each batch of 96 samples to monitor for physical contamination, along with a standard curve (a ten-fold serial dilution of a plasma-diluted culture stock of HXB2) for quantitative calibration and additional monitoring for cross-contamination. Unique dual indexing allowed us to monitor for, and exclude, reads with unexpected index pairs that can arise because of index hopping or overly dense optical clustering on the flow cell.
Total RNA was extracted with magnetized silica from HIV-infected plasma lysed with guanidine thiocyanate, with ethanol washes and elution steps performed using the NUCLISENS easyMAG system (bioMérieux). The total 30 μl elution volume was reduced with Agencourt RNAClean XP (Beckman Coulter). Libraries retaining directionality were prepared using the SMARTer Stranded Total RNA-Seq kit v2 - Pico Input Mammalian (Clontech, TaKaRa Bio). Dual-indexed amplified cDNA libraries were produced using in-house sets of 96 i7 and 96 i5 indexed primers. Details of the primers can be found elsewhere 2. Equal volumes of each amplified library were pooled in 96-plex. A total of 500 ng of pooled libraries was hybridized (SeqCap EZ reagent kit, Roche) to a mixture of custom HIV-specific biotinylated 120-mer oligonucleotides (xGen Lockdown Probes, Integrated DNA Technologies), then pulled down with streptavidin-conjugated beads. Unbound DNA was washed off the beads (SeqCap EZ hybridization and wash kit, Roche), and the captured libraries were PCR amplified to produce the final pool for sequencing on a MiSeq (Illumina) instrument with v3 chemistry, giving paired-end reads of up to 300 nt. Alternatively, up to 384 samples were sequenced on a HiSeq 2500 set to Rapid run mode using the HiSeq Rapid SBS kit v2, with maximum read lengths of 250 nt.

Consensus phylogeny
The consensus sequences output by shiver were each pairwise aligned to the subtype C reference genome used in phyloscanner (NCBI accession number AF443088.1) to generate a consensus alignment. A simian immunodeficiency virus sequence (NC_004455.1) was included to represent an outgroup. The consensus phylogeny in figure 2 was generated for this alignment using IQ-TREE 1.6.12 3 and the FreeRate nucleotide substitution model with four rate categories.

Phyloscanner procedure
We used a set of 898 genomic windows, each of 250 base pairs in length, spaced at regular intervals to span the entire HIV-1 genome. Alignment of the reads intersecting each window, together with a set of 22 reference sequences, was performed with phyloscanner, and phylogenetic reconstruction was performed on each alignment using IQ-TREE 1.6.12 4 and the FreeRate nucleotide substitution model with six rate categories. For a full list of command-line options used in HIV-TRACE, phyloscanner and IQ-TREE, see Table S4. Blacklisting, with a k parameter of 15, was performed to eliminate likely contaminant reads from the phylogenies. The 22 reference sequences were the standard references from phyloscanner for HIV-1 subtypes B and C, together with a random sample of 20 PopART consensus sequences.
Likely transmission pairs were identified from the 898 phylogenies where their subtrees lay within a normalised patristic distance threshold of 0·02 substitutions per site in at least 50% of windows, after adjusting for windows with missing coverage as in 5. Topological determination of the direction of transmission also used phyloscanner. Directionality was called if it was established in at least 33% of windows (adjusted for missing windows as before).
In a small number of cases where participants were reconstructed as the probable recipient of more than one transmission, the source was taken as the one with the greatest number of phyloscanner windows with normalised patristic distance less than 0·02.
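The thresholds above can be illustrated with a minimal Python sketch. It is not phyloscanner's implementation: the `Window` structure, the function names and the direction labels are hypothetical, and the adjustment for missing coverage is assumed here to mean simply dropping windows without coverage from the denominator, which may differ from the adjustment in the cited method.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

DISTANCE_THRESHOLD = 0.02   # normalised patristic distance (substitutions per site)
LINKAGE_FRACTION = 0.50     # minimum share of covered windows that must be close
DIRECTION_FRACTION = 0.33   # minimum share of covered windows supporting a direction


@dataclass(frozen=True)
class Window:
    """Hypothetical per-window summary for one pair of individuals."""
    distance: Optional[float]   # None when either individual lacks coverage
    direction: Optional[str]    # "a_to_b", "b_to_a", or None if unresolved


def classify_pair(windows: Sequence[Window]) -> dict:
    # Assumption: windows with missing coverage are dropped from the denominator.
    covered = [w for w in windows if w.distance is not None]
    close = sum(w.distance < DISTANCE_THRESHOLD for w in covered)
    result = {"linked": False, "direction": None, "close_windows": close}
    if not covered or close / len(covered) < LINKAGE_FRACTION:
        return result
    result["linked"] = True
    support = {d: sum(w.direction == d for w in covered)
               for d in ("a_to_b", "b_to_a")}
    best, other = sorted(support, key=support.get, reverse=True)
    # Call a direction only if it passes the threshold and beats the reverse.
    if (support[best] / len(covered) >= DIRECTION_FRACTION
            and support[best] > support[other]):
        result["direction"] = best
    return result


def choose_source(candidates: Dict[str, dict]) -> str:
    """For a recipient with several candidate sources, keep the one with
    the most windows under the distance threshold."""
    return max(candidates, key=lambda source: candidates[source]["close_windows"])
```

The tie-break between the two directions is a design choice of this sketch; the text only states the 33% requirement.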

Reconstruction of direction of transmission by estimated time of infection
Phylo-TSI gives both a point estimate for the date of infection and a standard deviation of the estimate; these were used to fit normal distributions. Directionality was called if the distributions for the two infection date estimates had an overlap of less than 20% area under the curve.
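As a rough illustration of this rule, the Python sketch below computes the overlapping area of two normal densities numerically and calls a direction when it is below 20%. The function names are hypothetical, dates are taken as decimal years, and the sketch assumes that the individual with the earlier estimated infection date is treated as the source.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def overlap_area(mean1: float, sd1: float, mean2: float, sd2: float,
                 steps: int = 20000) -> float:
    """Area under min(pdf1, pdf2), by the trapezoidal rule over +/- 8 SD."""
    lo = min(mean1 - 8 * sd1, mean2 - 8 * sd2)
    hi = max(mean1 + 8 * sd1, mean2 + 8 * sd2)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        y = min(normal_pdf(x, mean1, sd1), normal_pdf(x, mean2, sd2))
        total += y if 0 < i < steps else y / 2   # half weight at the endpoints
    return total * h


def direction_by_infection_date(date_a: float, sd_a: float,
                                date_b: float, sd_b: float,
                                max_overlap: float = 0.20):
    """Return "a_to_b", "b_to_a", or None when the estimates overlap too much."""
    if overlap_area(date_a, sd_a, date_b, sd_b) >= max_overlap:
        return None
    # Assumption: the earlier-infected individual is the source.
    return "a_to_b" if date_a < date_b else "b_to_a"
```

For two distributions with equal standard deviation, the overlap has the closed form 2Φ(−d/2σ), where d is the difference in means, which gives a quick check on the numerical integration.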

Power calculation for transmission pairs
The power calculation outlined in the PopART Phylogenetics study protocol (p55-57 of 6) predicted that a total of 269 transmission pairs from incident cases would be identifiable, under the assumption that of all the transmission pairs for which both individuals provided sequences, 75% could be identified as pairs by the analysis. In the event, we identified 300 such pairs. This is despite the dataset consisting of 6,865 sequences, considerably fewer than the 9,156 that the protocol predicted would be acquired (p31).

Detection and classification of drug resistance mutations (further details)
A bioinformatic pipeline, drmSEQ, was used to predict drug resistance to first-line adult ART based on detection of mutations in the Illumina reads generated by veSEQ-HIV using the Stanford HIV Drug Resistance Database scoring system (HIVdb version 8.9.1) 7 as follows: wild type/susceptible for scores 0-14, low-level resistance for scores 15-29, and high-level resistance for scores 30 and above. The method has been previously validated against an FDA-approved drug resistance assay 1. First-line adult ART national guidance at the time of sampling was the non-nucleoside inhibitor efavirenz in combination with nucleoside inhibitors including abacavir, AZT, D4T, DDI, FTC, 3TC or tenofovir.
Mutations were reported when detected in three or more PCR-deduplicated reads and 5% or more of the reads spanning each site. Drug resistance was reported as 'unknown' when fewer than 50% of sites relevant to each drug reached the minimum coverage threshold of 3 or more PCR-deduplicated reads.
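The reporting rules above can be written down as a short Python sketch. This is not the drmSEQ code: the function names and inputs are hypothetical, and read counts are assumed to be PCR-deduplicated already.

```python
from typing import Sequence


def resistance_level(score: float) -> str:
    """Map an HIVdb-style drug score to the three categories used in the text."""
    if score < 15:
        return "susceptible"
    if score < 30:
        return "low-level resistance"
    return "high-level resistance"


def mutation_reported(supporting_reads: int, spanning_reads: int,
                      min_reads: int = 3, min_fraction: float = 0.05) -> bool:
    """A mutation needs >= 3 deduplicated reads and >= 5% of reads at the site."""
    if spanning_reads == 0:
        return False
    return (supporting_reads >= min_reads
            and supporting_reads / spanning_reads >= min_fraction)


def drug_call(site_depths: Sequence[int], score: float,
              min_depth: int = 3, min_site_fraction: float = 0.5) -> str:
    """Return 'unknown' when fewer than 50% of the drug's relevant sites
    reach the minimum depth; otherwise classify the score."""
    covered = sum(depth >= min_depth for depth in site_depths)
    if not site_depths or covered / len(site_depths) < min_site_fraction:
        return "unknown"
    return resistance_level(score)
```

For example, `drug_call([5, 5, 0, 0], 40)` yields a high-level call because exactly half of the sites are covered, whereas `drug_call([5, 0, 0, 0], 40)` is reported as unknown.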

Calculation of relative transmission rates
The relative contribution of demographic groups to transmission, compared to their share of the overall population, was determined as follows. Suppose $a$ represents an age group and $s$ a sex. (PopART-IBM age groups are in five-year increments except that there exists a 13-14 group. We combined this with the 15-19 category to give a 13-19 group, and also combined all age groups over 50 together into one category.) We calculated $p_{a,s}$, the proportion of all HIV+ individuals in 2017 belonging to $a$ and $s$ according to the IBM, and $q_{a,s}$, the proportion of our pairs, weighted as described above, where the source belonged to $a$ and $s$ in July 2017 (regardless of their estimated date of infection). We then calculated:

$$r_{a,s} = \frac{q_{a,s}}{p_{a,s}}$$

This represents the relative rate of transmissions coming from the combination of $a$ and $s$. It is above 1 if that demographic category is overrepresented amongst sources, and less than 1 if it is underrepresented.
We can also calculate $r$ across all age groups to get the statistics $r_{F}$ for all females and $r_{M}$ for all males. Then $r_{M}/r_{F}$ represents the ratio of the number of male sources per infected male to the number of female sources per infected female. This ratio can also be calculated individually in each age group.
For $p_{a,s}$ we used the IBM-derived HIV+ population. As the vast majority of new infections will come from people not currently on ART, an alternative statistic is $p'_{a,s}$, representing the proportion of HIV-positive individuals not currently on ART who are in both $a$ and $s$. The relative rate using $p'_{a,s}$ as a denominator, $r'_{a,s}$, is calculated in the same way.
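A minimal Python sketch of this calculation follows. The function names, the tuple layout of the pairs, the sex labels "M" and "F", and the example numbers are all illustrative assumptions, not study data or the authors' code. Each pair carries its demographic weight, and the population shares stand in for the IBM-derived proportions (either all HIV+ individuals or those not on ART).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Group = Tuple[str, str]   # (age group, sex)


def source_shares(pairs: Iterable[Tuple[str, str, float]]) -> Dict[Group, float]:
    """Weighted share of pairs whose source falls in each (age group, sex)."""
    totals: Dict[Group, float] = defaultdict(float)
    for age, sex, weight in pairs:
        totals[(age, sex)] += weight
    grand = sum(totals.values())
    return {group: value / grand for group, value in totals.items()}


def relative_rates(pairs, population_share: Dict[Group, float]) -> Dict[Group, float]:
    """Share among sources divided by share of the reference population."""
    shares = source_shares(pairs)
    return {group: shares.get(group, 0.0) / p
            for group, p in population_share.items() if p > 0}


def male_female_ratio(pairs, population_share: Dict[Group, float]) -> float:
    """Male sources per infected male over female sources per infected female."""
    shares = source_shares(pairs)
    rate = {}
    for sex in ("M", "F"):
        among_sources = sum(v for (_, s), v in shares.items() if s == sex)
        among_population = sum(v for (_, s), v in population_share.items() if s == sex)
        rate[sex] = among_sources / among_population
    return rate["M"] / rate["F"]


# Illustrative inputs only.
population = {("25-29", "M"): 0.2, ("25-29", "F"): 0.3,
              ("30-34", "M"): 0.2, ("30-34", "F"): 0.3}
pairs = [("25-29", "M", 1.0), ("25-29", "M", 1.0),
         ("30-34", "F", 1.0), ("30-34", "M", 1.0)]
rates = relative_rates(pairs, population)
ratio = male_female_ratio(pairs, population)
```

In this toy example, males aged 25-29 make up half of the sources but a fifth of the population, so their relative rate is 2.5; restricting the same computation to a single age group gives the age-specific male-to-female ratio.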
To derive confidence intervals for these statistics, we performed a non-parametric bootstrap by sampling the set of observed pairs 200 times with replacement, re-weighting each new dataset by iterative proportional fitting, and recalculating each statistic. The limits of the confidence intervals were recorded as the 0.025 and 0.975 quantiles of the values of each statistic.

Table S1. Extended participants table. HIV-positive study participants, sequence availability, and inferred sources and recipients in probable transmission pairs, by arm of the study. HCF = health care facilities, PC seroconverters = participants in the population cohort who seroconverted during the trial period. PC0, PC12, PC24 = population cohort participants, HIV positive at baseline, recruited at the start of the trial (PC0), after 12 months (PC12N), and after 24 months (PC24N).
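The non-parametric bootstrap used for the confidence intervals on the relative transmission rates can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: `bootstrap_ci`, `statistic` and `reweight` are hypothetical names, and the iterative proportional fitting step is represented only by the `reweight` callback, which defaults to leaving the resampled pairs unchanged.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def bootstrap_ci(pairs: Sequence,
                 statistic: Callable[[Sequence], float],
                 reweight: Callable[[Sequence], Sequence] = lambda sample: sample,
                 n_boot: int = 200,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap: resample pairs with replacement, re-weight,
    recompute the statistic, and take the 0.025 and 0.975 quantiles."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]   # same size as the data
        values.append(statistic(reweight(sample)))
    # With n=40 the cut points fall at 0.025, 0.050, ..., 0.975.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Illustrative use: share of pairs with a male source (toy data).
toy_pairs = [("M", 1.0)] * 60 + [("F", 1.0)] * 40


def male_share(sample: Sequence) -> float:
    total = sum(weight for _, weight in sample)
    return sum(weight for sex, weight in sample if sex == "M") / total


low, high = bootstrap_ci(toy_pairs, male_share)
```

Passing a real re-weighting function as `reweight` would bring the sketch closer to the procedure described; a statistic that divides by a group count may also need guarding against resamples in which that group is empty.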

Recruitment
Table S3. Sensitivity analyses. Key quantities from the main analysis and from the three sensitivity analyses. All quantities apart from pair counts are demographically weighted.

Figure S1. Extended version of figure 4. Actual distribution of risk factors compared to the distribution expected if they had been allocated randomly to sources based on risk factor percentages identified in the source population.

Figure S2: Relative transmission rates calculated from the sensitivity analyses. The first two rows show the ratio of the proportion of sources in an age group that are female (top) or male (bottom) to the proportion in the same group of (left) all HIV+ individuals or (right) all HIV+ individuals not on ART. Ages are calculated as of July 2017. We identified no male sources in the 13-19 age group and thus this estimate is omitted. Line and point colours represent the main analysis and the three sensitivity analyses. The bottom graphs depict the ratios of the relative contributions of male and female sources by age group.

Figure S3: Histograms of estimated times from infection to sampling. (A) All 5,612 included participants. (B) The 300 recipients in the reconstructed transmission pairs.

Table S2. Demographic and other characteristics of all eligible participants, and sources and recipients in the reconstructed transmission pairs, by trial arm and overall. A total of 300 pairs were found, but 14 sources had multiple probable recipients.

Table S4: Command line options for HIV-TRACE, IQ-TREE, and the two phyloscanner commands.
For further details, see the relevant package manuals.